A CYK+ Variant for SCFG Decoding Without a Dot Chart
Abstract
While CYK+ and Earley-style variants are popular algorithms for decoding unbinarized SCFGs, in particular for syntax-based Statistical Machine Translation, these algorithms rely on a so-called dot chart, which suffers from high memory consumption. We propose a recursive variant of the CYK+ algorithm that eliminates the dot chart without increasing the time complexity of SCFG decoding. In an evaluation on a string-to-tree SMT scenario, we empirically demonstrate substantial improvements in memory consumption and translation speed.
Similar resources
Contrasting objective functions for CYK chart decoding
Context-free inference is a standard part of many NLP pipelines. Most approaches use a variant of the CYK dynamic programming algorithm to populate a chart structure with predicted nonterminals over each span. We can extract a parse tree from this chart in several ways. In this work, we compare two commonly used decoding approaches (Viterbi and max-rule) with a minimum-Bayes-risk (MBR) method w...
Beam-Width Prediction for Efficient Context-Free Parsing
Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log-linear model, we learn the optimal beam-search pruning parameters for each CY...
An Efficient Shift-Reduce Decoding Algorithm for Phrased-Based Machine Translation
In statistical machine translation, decoding without any reordering constraint is an NP-hard problem. Inversion Transduction Grammars (ITGs) exploit linguistic structure and balance the needed flexibility against complexity constraints well. Currently, translation models with ITG constraints usually employ the cubic-time CYK algorithm. In this paper, we present a shift-reduce decoding algor...
Technical Report: An n-free-passes CYK algorithm for error-correction and the prediction of non-canonical base-pairs in RNA secondary structure
Background: The prediction of non-canonical base-pairs in RNA secondary structure prediction has become increasingly important with the advent of next-generation sequencing technologies, where sequencing errors can introduce artificial non-canonical base-pairs in RNA secondary structure. These base-pairs are not appropriately accounted for by the currently existing models. Results: Here we focu...
SCFG latent annotation for machine translation
We discuss learning latent annotations for synchronous context-free grammars (SCFG) for the purpose of improving machine translation. We show that learning annotations for nonterminals results not only in more accurate translation, but also in faster SCFG decoding.